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Abstract. A utility-function approach to optimal spatial sampling design is a 
powerful way to quantify what “optimality” means. The emphasis then should 
be to capture all possible contributions to utility, including scientific impact and 
the cost of sampling. The resulting sampling plan should contain a component of 
designed randomness that would allow for a non-parametric design-based analysis 
if model-based assumptions were in doubt. 
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1 Introduction 

We would like to express our appreciation to Gustavo da Silva Ferreira and Dani Gamer- 
man (hereafter, FG) for their paper on Bayesian preferential spatial sampling (Ferreira 
and Gamerman, 2015) and to the editor of Bayesian Analysis for the opportunity to 
contribute to the discussion. Building on the papers by Muller (1999) and Diggle et al. 
(2010), the authors give a Bayesian approach to choosing new sampling locations after 
initial data are assumed to have been obtained under preferential sampling. 


2 What is Fixed and What is Random? 

Let the initial sample be yx , obtained at preferential sampling locations x; note that we 
have emphasised dependence of y on x through the notation yx, but it is exactly the 
same as what FG notate as y. The observation locations x and the observations yx are 
known by the statistician designing the next phase of the study, and hence all criteria 
and inferences should depend on both x and yx- One can see this most clearly in FG’s 
definition of the Bayesian design criterion U{d), given at the beginning of FG-Section 4. 
However, the reader should notice that equations FG-(3) and FG-(4) do not emphasise 
conditioning on x, along with yx, something we assume is an oversight on the part of 
the authors. 

It helped us to augment the notation for the utility function from u{d,6,yd) to 
M(d, 0, yd; X, yx); and likewise we suggest that the expected utility be notated: 

t/(d;x,yx) = £'(u(d,6»,yd;x,yx)|x,yx), (1) 
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where d is considered fixed and the expectation is taken over [0,yd|x, yx]. When there 
are many “stakeholders” (e.g., in an environmental study), each coming with his/her 
own utility function, how can a single utility function be constructed? Le and Zidek 
(2006, Chapter 11) opt for one based on entropy. Do the authors have any other sug¬ 
gestions to build “compromise” into a utility function? 

Notation is really important in these complex situations, so in the case of the utility 
function defined by FG-(4), which involves the latent process (not the observations), we 
suggest that u be rewritten as: 


M(d,6>,Sd;x,yx); 

that is, Sd replaces yd in u. Depending on the context, u could be a function of the 
new observations, yd = ... ,y{dm)y, or of the corresponding latent process, 

Sd = {s{di),..., s^dm))'] recall that FG have defined y{-) to be a noisy, shifted version 
of the mean-zero latent process s(-). 

In the rest of our discussion, we follow the authors’ lead and use (1), albeit with our 
modified notation that emphasises dependence on x and yx- The utility-function ap¬ 
proach to optimal design is attractive, but it will only be truly useful when components 
that quantify “how much?” and “why?” are specifically included; see Sections 3 and 4 
for further discussion. 

As FG make clear, the process s(-), the sampling locations x = (xi,... ,Xn)', and 
the observations yx = (y(xi),..., y(xn))^ have a possibly complex joint distribution. 
Following Diggle et al. (2010), the authors put structure on this joint distribution by 
assuming FG-(l), FG-(2), and a log-Gaussian Gox process. From the point of view of 
sample survey design, the information in x and yx is comparable to what one would gain 
from a pilot study, but it requires knowledge of components of 9 in order to make the 
pilot study operational. The following suggestion seems compatible with the authors’ 
approach to optimal design via preferential sampling. 

It is hard to design a study if there is no knowledge from which to draw. In the 
pre-pilot phase, one might choose a simple random sample which, in the spatial con¬ 
text, means that observation locations are sampled uniformly from the spatial region of 
interest, A. We note that this corresponds to a degenerate case of preferential sampling 
where fi, the coefficient of s(-) in the log-intensity, is equal to 0, and we also note the 
presence of randomness in this pre-pilot phase. 

After gaining knowledge to make the pilot study operational, x and yx are obtained 
from preferential sampling, and again we note the presence of randomness in choosing x. 
Given x and yx, the next set of locations, d = (di,..., dm)', need to be chosen, for which 
there will be a corresponding (based on the latent vector Sd) yu- This is the problem 
considered by FG, and their solution follows closely the proposal of Muller (1999). But 
there is an important difference: Muller considers d to be a “design parameter” that he 
clearly treats as non-stochastic (fixed). We would like to ask FG the following question: 
If X is considered to be random in the pilot phase, why would d be treated as fixed in 
the main phase of the study? 
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The authors follow Muller’s (1999) proposal closely but, in our opinion, they lose an 
opportunity to build a sequential-sampling-design strategy that updates the posterior 
distribution for 6 and s(-) through random sampling from the distribution [d|x, yx]. This 
can be obtained from [?/(-)l'S(')] [x, d|s(-)] = [d|x, s(-)] [x|s(-)]. The second factor in 

the product is a log-Gaussian Cox process on A, and the first factor is presumably a 
log-Gaussian Cox process too, but on 7l\x. Can the authors comment on our suggestion 
that designed randomness be used to obtain d? 

The following section makes strong links between FG’s proposal and the survey¬ 
sampling literature. It also reinforces the general desire for a component of randomisa¬ 
tion in the design. 


3 Spatial Sampling Designs 

The development in FG (and in the literature that it refers to) frames the sampling 
problem as the selection of a set of points on a grid that “covers” the region of interest, 
A. From this perspective, it is a special case of a finite-population sampling problem, 
with the N grid points defining the population, and with two random quantities defined 
on these points. The first is the sample-selection process that results in x (from the 
pilot study) and d (from the main survey). The second is the latent process s(-), from 
which “noisy” observations yx and yu are taken at x and d, respectively. 

The aim of a spatial sampling design should be to specify a suitable procedure 
for making a draw from the distribution of d given x and yx; see our discussion in 
Section 2. What is meant by “suitable” depends crucially on the target of inference 
for the sampling exercise; FG make their target s(-) and to a lesser extent 9, and 
they assume that “suitability” can be characterised through a utility function u. Their 
optimal sample d is then the set of (presumably so far unsampled) grid points that 
maximise the expected value of this utility, where recall that the expectation is with 
respect to the joint distribution of 9 and Sd conditional on x and yx. 

The authors’ optimal-design procedure is explicitly model-based. Furthermore, the 
fact that selection of d depends on a log-Gaussian Cox process with intensity function 
that is a function of s(-) means that the sampling design is informative (Chambers and 
Clark, 2012, Section 1.4), which Diggle et al. (2010) and FG refer to as preferential 
sampling. That is, one cannot treat the realised value of d as ancillary when using 
the combined pilot-study and main-survey data to make inferences. There is a well 
developed theory in the sample-survey literature for the analysis of a sample collected 
via informative sampling; see Chambers et al. (2012). From this perspective, the use 
of a log-Gaussian Cox process as a model for x is equivalent to Poisson sampling with 
inclusion probabilities that depend on the values of s(-) over the grid defining the finite 
population. We would like to draw attention to an extensive literature on this type of 
sampling and its implications, including many Bayesian approaches; see Nandram et al. 
(2013). 

More complex informative-sampling methods have also been investigated, principally 
in the context of sampling spatially clustered populations; see Rapley and Welsh (2008) 
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for a Bayesian specification and, in the context of sampling on networks, see Thompson 
and Seber (1996). Awareness of this closely related literature would seem advantageous 
for further development of the ideas set out in FG’s paper. 

The main inferential paradigm in survey sampling is design-based inference, which 
assumes that y(-) is fixed, and all inference is relative to the distribution of d. More¬ 
over, the outcome of the pilot study (x and yx) is treated as fixed. Generally, the 
survey-sampling approach is based on frequentist inference about population summary 
statistics. The inference uses weights obtained from the randomisation in the design, 
along with the population values y(-) over the grid. In the simplest case, these weights 
are defined by the inverses of the inclusion probabilities for each of the elements of d, 
but more general “calibrated” weights are typically preferred; see Deville and Sarndal 
(1992). When informative (i.e. preferential) sampling is used, design-based inference, al¬ 
though theoretically still applicable, becomes problematic in practice. In order to carry 
out the survey sampling, one has to have access to the distribution of d, which depends 
on the latent process s(-). For design-based inference, one might try replacing s(') in the 
intensity function of the log-Gaussian Gox process with z(-), a spatial covariate whose 
value is known for every point on the grid and which is (hopefully) highly correlated 
with s(-). This is the model underpinning size-biased sampling; see Path and Rao (1978). 

A model-based approach seems therefore necessary under preferential sampling, such 
as assuming a spatial-statistical model for s(-). However, this does not mean that the 
basic design-based notions of randomisation, stratification, and clustering cannot be 
used in a preferential-sampling approach, since they are all useful tools that lead to a 
better representation of a heterogeneous population. In particular, what happens when 
the model FG-(l) and FG-(2) does not adequately describe the spatial variability in 
y(-) and s(-)? The optimality of d, and the validity of any consequent inference depends 
critically on the appropriateness of this model. This is clearly a weakness, should the 
design be for a highly scrutinised environmental study where scientists are worried not 
only about the environment but also about the team of lawyers waiting to litigate! Other 
problems arise when there are relatively few such choices of d, irrespective of the values 
of X and yx, all of whose utilities are comparable. 

We would like to reiterate that some form of randomisation in a design is always 
a good idea, because it offers protection against a biased (unintentional or intentional) 
choice of sample sites (e.g. Aldworth and Gressie, 1999). Perhaps more importantly, 
randomisation ensures that an updated fit of an assumed model for s(-) can be validly 
assessed from sample data and that replication-based ideas can be used for this purpose. 
And finally, when the parametric model is in doubt, the presence of randomisation allows 
the possibility that design-based inference could be used. 

As far as we are aware, there has been no work on “robustifying” inference based on 
data collected via preferential sampling, in order to make it less sensitive to model 
misspecification. Perhaps FG’s paper will stimulate such investigations. Recent re¬ 
search reported in Welsh and Wiens (2013) may provide an indication of how a robust 
preferential-sampling approach might work, with these authors developing an approach 
to sampling design that minimises the maximum prediction error in a neighbourhood of 
an assumed model for s(-). A related line of research concerns what could be termed as a 
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composite approach to preferential sampling, where a proportion of the sampling effort 
is randomly spread over the spatial grid, with the rest allocated to a more targeted 
preferential sampling design. An important research question here concerns how this 
allocation might be determined, based on the information in x and yx; this is discussed 
further in Section 4. 

We conclude this section by stating some basic elements of a good sampling design, 
be it spatial or not. A good design will stratify to ensure sampling over a range of levels 
of factors or a range of values of covariates. A good design will specify, in advance, 
inference thresholds and determine the number of observations per stratum needed to 
achieve those thresholds. Such designs create a rational basis for the inevitable com¬ 
promise between the cost of the study and the ability to make scientific inferences from 
incomplete and noisy data (e.g. Cressie, 1998; Zidek et ah, 2000). Finally, a good design 
will involve a component of designed randomness, from which non-parametric, design- 
based inference is also possible, should the model-based assumptions be in doubt. 


4 Utility Functions 

The process s(-) and the behaviour of the observations yx depend on parameters, which 
are denoted as 9. If 9 were known, then [x, yx, s(-)|^] = [yx|x, s(-), 0] [x, s(-)|0], and the 
predictive distribution is 

[s(-)|x,yx,6'] = [yx|x,s(-),6'] [x,s(-)|^'] / [x,yx|6'] • (2) 

Using the terminology of Cressie and Wikle (2011), an empirical hierarchical model 
(EHM) results if an estimate 9 is used in place of 9 in (2), and inference on s(-) is then 
based on the empirical predictive distribution, 

[s(-)|x,yx,0] = [yx|x,s(-),0][x,s(-)|0]/[x,yx|0]. (3) 

This EHM set-up is what Diggle et al. (2010) use, and they address the importance of 
making 9 a function of both x and yx. 

If there is uncertainty in 9 that can be expressed in terms of a prior probability 
distribution [0], then a Bayesian hierarchical model (BHM) results. Bayes’ Theorem 
yields the posterior distribution, 

[«(•)> 6'|x,yx] = [yx|x,s(-),6i][x,s(-)|6»][6i]/[x,yx]. (4) 

For a BHM, the Bayesian predictive distribution is the integral of (4) with respect to 
9, namely / [s(-), ^Ix, yx] d6>. 

The BHM is coherent in the sense that all inferences emanate from a well defined 
joint probability distribution. On the other hand, it requires specification of a prior [0], 
and it often consumes a large amount of computing resources. The EHM represents a 
compromise that may achieve computational efficiency. 

Diggle et al. (2010) do not address optimal spatial design in the way that FG do. 
If one sets about doing it, analogous to FG’s approach but within Diggle et al.’s EHM 
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framework, one would modify (1) so that the right-hand side would be the expectation 
taken over [yd|x, yx, and hence one would write the expected utility as U (d, 0; x, yx). 
Then the empirical utility is U{d, 0, x, y^) and, analogous to FG’s approach, one would 
find the EHM-optimal d by maximising U (d, 0; x, yx) with respect to d. Is there a more 
principled way to account for 0 (which is considered fixed but unknown) in the EHM 
framework? 

Let W = {d, 0, yd} denote all the unknowns in FG’s model. Let IF(x, yx) be one of 
many possible decisions about W based on x and yx- Some decisions are better than 
others, which can be quantified through a very general utility function that is bounded 
above, and which we denote as Z//(kF, IF(x, yx)); the utility function, u, used by FG 
represents a particular form of the more general U considered here. Note that U should 
account for “how much?” and “why?” and could be negative. Obviously, large utilities 
are preferred, and it is a consequence of decision theory (e.g. Berger 1985) that the 
optimal decision is: 

IE*(x,yx)=arg^sup {e(W(IE, l?(x, yx))|x, yx)} . (5) 

VV(x.yx) 

Now suppose that the goal is inference on g(IF), where g{-) is a known, scientifically 
interpretable, possibly multivariate function of W. The answer to this inference problem 
is found in the predictive distribution, [g(IF)|x, yx]. Let g denote a generic predictor of 
g{W). The mean of the predictive distribution of g{W), namely E(g(IF)|x, yx), is a com¬ 
monly used predictor, but this is just one of many possibly summaries of [ 5 (kF)|x, yx]. 

Why use the mean? Because it is straightforward to show that E( 5 (kF)|x, yx) solves 
(5) when the utility function is “negative squared-error,” —{g — g{Wjy{g — g{W)). How¬ 
ever, a negative squared-error utility assumes equal consequences for under-estimation 
as for over-estimation, which is not appropriate when g{W) represents extreme events, 
such as crop failure due to drought. 

Notice that we have written the utility as a function of all the unknowns, IF, and 
a decision about all the unknowns, IF. This gives us the opportunity to design for 
making inference on d simultaneously with making inference on 0, for example. Recall 
from Section 3 our discussion of the composite approach to optimal design. One of the 
components of 0 might be the derivative of the variogram of s(-) at the origin (a critical 
parameter for kriging), which we simultaneously want to infer along with predicting 
the hidden spatial process s(-). Laslett and McBratney (1990) give a composite spatial 
design that distributes sampling locations regularly over A (for inference on s(-)) and, 
around some of those locations, further locations are chosen very close together (for 
inference on 0). Do FG have any suggestions as to how an optimal composite spatial 
design might be obtained under their utility-function approach? 

In conclusion, we thank the authors for their stimulating paper, and we can see a 
number of very interesting research problems waiting to be solved. 
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